Austrian Hotels Dataset

Overview

This dataset contains realistic simulated data on hotels across Austria, designed for practicing data wrangling and table joins. The dataset consists of multiple related tables that can be combined using various join operations.

Used in: Week 4 (Joining Tables)

Generated by: Claude 3.7 Sonnet, with realistic relationships between variables

Dataset Structure

The dataset includes 8 related tables describing 200 hotels across 10 Austrian cities, covering occupancy, pricing, tourism statistics, and economic indicators.

Core Tables

File	Description	Rows	Key Columns
`hotels.csv`	Basic hotel information	200	`hotel_id` (PK)
`cities.csv`	City information	10	`city` (PK)
`monthly_occupancy.csv`	Monthly hotel performance metrics	~3,800	`hotel_id`, `month`, `year`
`city_tourism.csv`	Monthly tourism statistics by city	240	`city`, `month`, `year`
`economic_indicators.csv`	Monthly economic indicators	24	`month`, `year`
`reviews.csv`	Hotel guest reviews	~1,700	`review_id` (PK), `hotel_id` (FK)
`amenities.csv`	List of possible hotel amenities	10	`amenity_id` (PK)
`hotel_amenities.csv`	Hotel-amenity relationships	~1,000	`hotel_id`, `amenity_id`

Key Relationships

Many-to-One: Hotels → Cities (through city name; each of the 10 cities hosts many of the 200 hotels)
One-to-Many: Hotels → Monthly Occupancy, Hotels → Reviews
Many-to-Many: Hotels ↔︎ Amenities (through hotel_amenities)
Composite Keys: Monthly data requires (hotel_id, month, year) or (city, month, year)
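
The relationships listed above can be sketched in pandas as follows. This is a minimal illustration, not the course's reference solution: the small data frames stand in for the real CSV files, and every column other than the key columns named in this README (`occupancy_rate`, `arrivals`, `amenity_name`), as well as all values including the year, are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the real files. Only the key columns come from this README;
# occupancy_rate, arrivals, amenity_name and all values are invented.
hotels = pd.DataFrame({
    "hotel_id": [1, 2, 3],
    "city": ["Vienna", "Vienna", "Graz"],
})
monthly_occupancy = pd.DataFrame({
    "hotel_id": [1, 1, 2, 3],
    "month": [1, 2, 1, 1],
    "year": [2023, 2023, 2023, 2023],
    "occupancy_rate": [0.61, 0.58, 0.70, 0.49],
})
city_tourism = pd.DataFrame({
    "city": ["Vienna", "Vienna", "Graz"],
    "month": [1, 2, 1],
    "year": [2023, 2023, 2023],
    "arrivals": [410_000, 395_000, 52_000],
})
amenities = pd.DataFrame({"amenity_id": [10, 11], "amenity_name": ["Spa", "Pool"]})
hotel_amenities = pd.DataFrame({"hotel_id": [1, 1, 2], "amenity_id": [10, 11, 10]})

# Hotels -> Monthly Occupancy: many monthly rows point at one hotel row.
# validate= raises an error if hotel_id is not unique in hotels.
occ = monthly_occupancy.merge(hotels, on="hotel_id", how="left", validate="many_to_one")

# Composite key: city-level rows must match on city AND month AND year.
occ_city = occ.merge(
    city_tourism, on=["city", "month", "year"], how="left", validate="many_to_one"
)

# Many-to-many: go through the bridge table, one merge per side.
hotel_with_amenities = (
    hotels.merge(hotel_amenities, on="hotel_id", how="left")
          .merge(amenities, on="amenity_id", how="left")
)
```

Joining on only part of a composite key (for example `city` alone) would pair every occupancy row with every month of tourism data for that city, which is why the full key list is passed to `on`.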

Documentation

hotel-data-readme.md - Detailed schema documentation with column descriptions and data types

Learning Objectives

This dataset allows students to practice:

- Inner, left, right, and full joins
- One-to-one and one-to-many relationships
- Composite key joins
- Data aggregation after joins
- Handling missing values in joins
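
As a rough illustration of how the join types differ and where missing values come from, the sketch below merges two toy tables with pandas. The `rating` column, the city names, and all values are assumptions for the example; hotel 4 is deliberately absent from `hotels` so that unmatched rows appear.

```python
import pandas as pd

# Toy data; the rating column and all values are invented for illustration.
hotels = pd.DataFrame({"hotel_id": [1, 2, 3], "city": ["Vienna", "Graz", "Linz"]})
reviews = pd.DataFrame({
    "review_id": [100, 101, 102],
    "hotel_id": [1, 1, 4],  # hotel 4 has no row in hotels on purpose
    "rating": [4.5, 3.0, 5.0],
})

# Row counts per join type ("outer" is pandas' name for a full join).
row_counts = {
    how: len(hotels.merge(reviews, on="hotel_id", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 2, 'left': 4, 'right': 3, 'outer': 5}

# indicator=True adds a _merge column showing where each row came from,
# which makes unmatched keys easy to inspect.
full = hotels.merge(reviews, on="hotel_id", how="outer", indicator=True)
unmatched = full[full["_merge"] != "both"]

# Aggregation after a left join: count() skips missing review_id values,
# so hotels without reviews get 0 instead of disappearing.
review_counts = (
    hotels.merge(reviews, on="hotel_id", how="left")
          .groupby("hotel_id")["review_id"]
          .count()
)
```

Hotels 2 and 3 survive the left join with missing `review_id` and `rating`, whereas an inner join would silently drop them.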

Sample Research Questions

How do hotel prices vary by city and season?
What’s the relationship between amenities and guest ratings?
How do economic indicators affect hotel occupancy rates?
Which cities have the highest tourism-to-hotel capacity ratios?
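
One possible starting point for the first question (prices by city and season) is sketched below. It assumes a hypothetical `avg_price` column in `monthly_occupancy.csv` and that `month` is stored as an integer from 1 to 12; check `hotel-data-readme.md` for the actual column names and types. The data values and the season mapping are likewise illustrative.

```python
import pandas as pd

# Toy data; avg_price is a hypothetical column name and all values are invented.
hotels = pd.DataFrame({"hotel_id": [1, 2], "city": ["Vienna", "Salzburg"]})
monthly_occupancy = pd.DataFrame({
    "hotel_id": [1, 1, 2, 2],
    "month": [1, 7, 1, 7],
    "year": [2023, 2023, 2023, 2023],
    "avg_price": [120.0, 150.0, 140.0, 130.0],
})

SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Join first (to bring in the city), then derive the season and aggregate.
prices = monthly_occupancy.merge(hotels, on="hotel_id", how="left")
prices["season"] = prices["month"].map(SEASONS)
price_by_city_season = (
    prices.groupby(["city", "season"], as_index=False)["avg_price"].mean()
)
```

The same join-then-aggregate pattern applies to the other questions, swapping in `reviews.csv`, `economic_indicators.csv`, or `city_tourism.csv` as the second table.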